Have you ever wanted to share a project using IPUMS data with a colleague, but then thought, “Oh no, I can’t redistribute my IPUMS data!”

Maybe you’d like a colleague to explore your findings. Or maybe you’re a teacher with an exercise you’d like your students to review and replicate. In the past, if you wanted someone to use the same IPUMS data that you did, you would need to provide a list of samples and variables, plus instructions for your collaborator on how to navigate the online data extract system.

If you’re thinking that sounds like a pain, don’t worry: the brand new IPUMS microdata API makes it easier than ever to share your extract definitions with fellow IPUMS users! Using the microdata API, you and your collaborators can save an extract definition to a file, pass that file around, and submit an identical extract request from it.


The latest version of ipumsr contains new functions allowing users to call the IPUMS microdata API directly from R or RStudio. Python users should check out ipumspy. For more on the microdata API, see our other recent blog posts.

In this post, I’ll first introduce the ipumsr functions for saving an extract definition to a .json file and loading a saved definition from .json. Then, I’ll demonstrate two use cases for those functions: sharing an analysis in an R Markdown document, and sharing an interactive application created with Shiny. Note that the code examples here will only work once you’ve requested beta access to the IPUMS microdata API by emailing ipums+api@umn.edu and set up your API key.

Sharing extracts using ipumsr

To share an extract using ipumsr, you first need an extract definition to work with. You can create a new extract definition with define_extract_usa() or define_extract_cps(). Or, if you’ve already submitted the extract, you can pull down the definition of any submitted extract with get_extract_info(). This works whether you created the extract with API functions or with the online extract system. To pull down the definition of your IPUMS USA extract number 10, you would call get_extract_info() with that collection and extract number.
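As a minimal sketch of that lookup, assuming ipumsr is installed and your API key is already set up: get_extract_info() is the ipumsr function named above, while the helper names extract_id() and fetch_extract_definition() are made up for this illustration. The "collection:number" shorthand is my understanding of how ipumsr identifies an extract, so check the documentation for your installed version.

```r
# Hypothetical helper: build the "collection:number" shorthand, e.g. "usa:10".
extract_id <- function(collection, number) {
  paste0(collection, ":", number)
}

# Hypothetical helper: fetch the definition of an already-submitted extract.
# Assumes ipumsr is installed and an IPUMS API key is configured.
fetch_extract_definition <- function(collection = "usa", number = 10) {
  ipumsr::get_extract_info(extract_id(collection, number))
}

# Needs network access and an API key, so it is left commented out here:
# extract_to_share <- fetch_extract_definition("usa", 10)
```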

Once you have your extract definition stored in an R object like extract_to_share, you can save that definition to a .json file with save_extract_as_json().

Then you can share the file extract_to_share.json with a collaborator, or in a public repository such as GitHub, and anyone with the file can submit their own identical extract request by loading it with define_extract_from_json() and passing the result to submit_extract().
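The save-and-reload round trip might look like the sketch below. It assumes the ipumsr functions save_extract_as_json(), define_extract_from_json(), and submit_extract(); the wrapper names are hypothetical, and argument lists have shifted between ipumsr releases (I believe some beta versions also asked for the collection when reading the .json file), so treat this as an outline rather than the definitive calls.

```r
# Hypothetical wrapper: write an extract definition to a shareable .json file.
save_definition <- function(extract, path = "extract_to_share.json") {
  ipumsr::save_extract_as_json(extract, file = path)
  invisible(path)
}

# Hypothetical wrapper: what a collaborator would run after receiving the file.
# Submitting needs network access and the collaborator's own API key.
resubmit_shared_extract <- function(path = "extract_to_share.json") {
  if (!file.exists(path)) {
    stop("Extract definition not found: ", path, call. = FALSE)
  }
  definition <- ipumsr::define_extract_from_json(path)
  ipumsr::submit_extract(definition)
}
```

The file check runs before any API call, so a collaborator who has not yet copied the .json file gets a clear message instead of a failed request.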

Sharing an analysis in R Markdown

R Markdown is a plain-text file format that allows you to combine prose, code, and analysis output into one document. To help users share an analysis of IPUMS data in an R Markdown document, we’ve created a new R Markdown template, the “.Rmd for Reproducible Research” (RRR). You can download the template as a standalone file here, or you can install the development version of ipumsr (by following the instructions here) and access the template through the RStudio menu interface as shown below.

The beauty of the RRR is that it allows your collaborators to run your analysis out-of-the-box, without taking any separate steps to download the data. How does it accomplish this? Let’s take a look.


The first step in using the RRR workflow is to create a data extract. While it is possible to create extracts entirely within R (more on that here), many users (this author included) may want to use the online IPUMS extract system to create and submit their extracts. Once you’ve submitted your extract, take note of the extract number, then begin working with the RRR as follows.

In RStudio, select File > New File > R Markdown:

Screenshot of File menu in RStudio, with New File and R Markdown selected.

In the popup menu, select From Template in the left sidebar, then Rmd for Reproducible Research from the list of templates, and click OK:

Screenshot of New R Markdown popup in RStudio, with From Template selected in the left sidebar, and Rmd for Reproducible Research selected from the list of templates.

Now here we are, looking at a wall of instructions:

Screenshot of the RRR R Markdown template file opened in the RStudio editor.

But don’t worry! We’ve tried to make this as painless as possible. In just a few steps you’ll have your IPUMS data downloaded and the framework for a shareable analysis project. First, scroll down to the first code chunk, labeled “project-parameters”, and fill in values for the four parameters defined there: the IPUMS collection and extract number of your submitted extract, a descriptive name for your extract, and a subfolder in which to save your data files.

In fact, you can leave all the default values of these parameters if you want to analyze your most recent IPUMS USA extract, though I’d recommend filling in a better descriptive_name for the extract even in that case.
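For orientation, the four parameters could be filled in along these lines. Only descriptive_name is a name taken from the template as described in this post; the other three names and all of the values are placeholders I have assumed for illustration, so match them to whatever your copy of the template actually uses.

```r
# Illustrative stand-in for the "project-parameters" chunk.
# Only `descriptive_name` is named in this post; the rest are assumed names.
project_parameters <- list(
  collection       = "usa",             # IPUMS collection of the extract
  extract_number   = 117,               # number of the submitted extract
  descriptive_name = "prcs_migration",  # used to name the saved files
  data_dir         = "Data"             # subfolder for downloaded files
)

# Print the values to confirm them before knitting.
str(project_parameters)
```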

After filling in values, save the file, then click the RStudio “Knit” button, and awaaaaaaay it goes! All that’s left to do is sit back, relax, and –

Screenshot of error in the RStudio Render pane, reading “NOT AN ERROR: usa extract number 117 is not yet ready to download. Try re-running again later.”

…wait, is that an error???

Gif of Gimli falling down and saying “That was deliberate!” from The Lord of the Rings: The Two Towers


Documents built from the RRR don’t bundle the data itself. Instead, they use the .json extract definition to create and submit a new data extract the first time the script is run. By sharing as few as two files, you can allow a colleague or student to download the exact same IPUMS data you used in order to replicate or further explore your work. We hope this helps make research more accessible and replicable. Read on to see the RRR in action, as we explore some data from the Puerto Rico Community Survey, available from IPUMS USA.

The basic assumptions of the template are that you:

  1. Have registered with IPUMS USA (or IPUMS CPS)
  2. Have generated an IPUMS API key
  3. Have added that key to your .Renviron
  4. Have a specific dataset you want to download, analyze, and visualize
  5. Would like to let others (IPUMS users) replicate the work
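Assumptions 2 and 3 can be checked from the R console before knitting. This sketch assumes ipumsr reads the key from an environment variable named IPUMS_API_KEY, which is my understanding of its default; the helper name has_ipums_api_key() is hypothetical.

```r
# Hypothetical helper: TRUE if an API key is visible to this R session.
# Assumes the key is stored in the IPUMS_API_KEY environment variable,
# for example via a line like IPUMS_API_KEY=your-key-here in .Renviron.
has_ipums_api_key <- function() {
  nzchar(Sys.getenv("IPUMS_API_KEY"))
}

if (!has_ipums_api_key()) {
  message("No IPUMS API key found; add one to .Renviron and restart R.")
}
```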

For this example, we’re using IPUMS USA data, specifically the Puerto Rico Community Survey for the years 2015-2019.

To get started in R, make sure to update ipumsr, then select our new RRR template.

Now here we are, looking at a (possibly) overwhelming amount of code. But don’t worry! We’ve tried to make this as painless as possible. In fact, this template can run out of the box, defaulting to IPUMS USA and your most recent extract. Since this just so happens to be the extract I’d like to work with, I can proceed without making any edits, simply by clicking Knit or running rmarkdown::render().

And awaaaaaaay it goes! All that’s left to do is sit back, relax, and -

…wait, is that an error???

gimli from lord of the rings saying “it was deliberate”

No! See, as both the ‘error message’ and our friend Gimli indicate - that’s not an error! The RRR is set up to be run/Knit a few times, at your leisure. The reason for this is to ensure that the IPUMS servers have time to process your data requests. But look, something did happen - we’ve added a subfolder named Data. And within that are two new files: a .json extract definition and a chk_....csv file. The first file contains all the information needed to get your data (or to share with friends/loved ones) and the second file, you don’t need to worry about!
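The “knit again later” behaviour could be approximated as follows. This is not the template’s actual code, just a sketch under assumptions: is_extract_ready() and download_extract() are ipumsr functions, the helper names are invented, and I am assuming the extract object exposes collection and number fields.

```r
# Hypothetical helper: rebuild the friendly message shown when knitting early.
not_ready_message <- function(collection, number) {
  paste0(
    "NOT AN ERROR: ", collection, " extract number ", number,
    " is not yet ready to download. Try re-running again later."
  )
}

# Hypothetical helper: download the data only once the extract is ready.
# Assumes `extract` is an ipumsr extract object with `collection` and
# `number` fields, and that an API key is configured.
download_if_ready <- function(extract, download_dir = "Data") {
  if (!ipumsr::is_extract_ready(extract)) {
    stop(not_ready_message(extract$collection, extract$number), call. = FALSE)
  }
  ipumsr::download_extract(extract, download_dir = download_dir)
}
```

Stopping with a clearly labelled message keeps the document from rendering half-finished output while the IPUMS servers are still processing the request.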

Now, you may have noticed that these files are both called “template,” and you might be wondering why. This is one of the default parameters of the RRR. Users will want to edit this, which can easily be done in the first code chunk of the RRR. Depending on your window and font size, you may need to scroll, or you can use the Table of Contents to jump down to Setup-Project Parameters. We’ll set this to a more descriptive name, since our main focus will be migration rates in Puerto Rico.

Since we’ve changed the descriptive_name parameter, it’s helpful to delete the .json and chk_.csv files with the old name, “template”, before proceeding (if you set a proper descriptive name in the first place, you would not need to delete anything). With the name updated, awaaaay we knit!

And with just 2 clicks, we’ve pulled our most recent IPUMS USA data DIRECTLY into our Rproj! (I really can’t overstate how cool this feature is.) You’ll notice there’s some basic descriptive information included by default. Feel free to replace it as you develop your analyses.

From here, we can fill out the remainder of the RRR with whatever analysis we’d like, such as plotting migration rates over time. To explore the full features of the RRR, head to github.com/ipums/simple-api-shiny-app. Clone the repo to try out the interactive tabset HTML report for yourself. Or check out the pre-rendered version, though many features such as code folding are not available in this version.

This template is very much in beta, so be sure to share your feedback by emailing us or creating an issue on GitHub. As an even-more-beta bonus, we’ve included a simple Shiny app: the Variable Variation Value Viewer (VVVV), which uses these functions in a similar way to create a self-compiling web app.

Sharing an interactive Shiny app

Also included in this repo is the Variable Variation Value Viewer (VVVV). This app follows the same steps as the RRR; however, it also makes use of the wait_for_extract() function. That’s right, you can define, submit, wait, and download your IPUMS data all automatically…though you may be waiting a while for larger extracts. The extract used in this example is intentionally small so that users do not need to wait long (avg < 1 min) for the app to load. As mentioned above, this is not meant to be a robust, one-size-fits-all app. But it does provide a pretty neat way to show users what you’ve done…

And let them explore further trends…

Complete with metadata
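A define, submit, wait, download chain of the kind the app relies on could be sketched like this. wait_for_extract() is named above; define_extract_from_json(), submit_extract(), download_extract(), and read_ipums_micro() are other ipumsr functions I am assuming the app uses, and load_shared_extract() is a made-up name rather than anything from the repo.

```r
# Hypothetical helper: go from a shared .json definition to a data frame.
# Needs network access and an API key; may block for a while on big extracts.
load_shared_extract <- function(json_path, download_dir = tempdir()) {
  definition <- ipumsr::define_extract_from_json(json_path)
  submitted  <- ipumsr::submit_extract(definition)
  ready      <- ipumsr::wait_for_extract(submitted)  # polls until complete
  ddi_path   <- ipumsr::download_extract(ready, download_dir = download_dir)
  ipumsr::read_ipums_micro(ddi_path)
}

# In a Shiny app this would typically run once at startup, for example:
# app_data <- load_shared_extract("extract_to_share.json")
```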

We hope this inspires some cool new uses of IPUMS data. Happy coding and remember,

Use it for Good.